Optimal Data Migration in Self-managing Storage Systems

Authors

  • Gaurav Veda
  • Kaushik Lakshminarayanan
Abstract

The need for petascale storage and beyond has led to new storage architectures that are scalable, inexpensive and efficient, replacing large monolithic disk array systems. Cluster-based storage systems ([3], [5]), comprising a large collection of small, inexpensive and unreliable storage nodes (storage bricks), are becoming increasingly popular. In such systems, data is distributed redundantly among the bricks in order to improve reliability and performance. Using a data encoding scheme specialized to the workload and fault-tolerance requirements can lead to substantial performance benefits compared to a “one size fits all” model ([3]). It is therefore desirable that these cluster-based systems provide the versatility to specialize data distribution choices for various classes of data and their workloads. For example, a file accessed sequentially should be erasure coded to reduce the amount of space needed, while one requiring many random accesses should be replicated. The inherent complexity of distributed systems, together with the additional complexity introduced by this versatility, makes managing such systems extremely difficult. Thus, we want such systems to be self-managing. In particular, the system should itself decide what the best data encoding for a particular storage object is. When a new storage object is introduced into the system, it can be stored using some encoding scheme. As the workload and usage characteristics of the object become known, the system should change the encoding of the object to one that achieves the best performance and the required level of fault tolerance.
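The rule of thumb in the abstract (erasure code sequentially accessed objects, replicate randomly accessed ones) can be illustrated with a minimal Python sketch. This is not the authors' method: the names `Replication`, `ErasureCode` and `choose_encoding`, the 0.5 threshold, and the fixed choice of 4 data fragments are all hypothetical, chosen only for illustration. The overhead formulas are the standard ones: r full copies cost r times the object size, and a (k, m) erasure code costs (k + m) / k times the object size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replication:
    copies: int  # total number of full copies stored

    def storage_overhead(self) -> float:
        # r full copies occupy r times the object size.
        return float(self.copies)

    def failures_tolerated(self) -> int:
        return self.copies - 1


@dataclass(frozen=True)
class ErasureCode:
    data_fragments: int    # k
    parity_fragments: int  # m

    def storage_overhead(self) -> float:
        # k data fragments plus m parity fragments occupy (k + m) / k
        # times the object size.
        return (self.data_fragments + self.parity_fragments) / self.data_fragments

    def failures_tolerated(self) -> int:
        # Holds for MDS codes such as Reed-Solomon.
        return self.parity_fragments


def choose_encoding(random_access_fraction: float,
                    failures_required: int,
                    threshold: float = 0.5):
    """Pick an encoding from a workload summary (hypothetical policy).

    random_access_fraction: share of accesses that are random, in [0, 1].
    failures_required: number of brick failures the object must survive.
    threshold: assumed cut-off above which the object is replicated.
    """
    if not 0.0 <= random_access_fraction <= 1.0:
        raise ValueError("random_access_fraction must be in [0, 1]")
    if failures_required < 0:
        raise ValueError("failures_required must be non-negative")
    if random_access_fraction >= threshold:
        return Replication(copies=failures_required + 1)
    # Assumed fixed k = 4; a real system would tune k as well.
    return ErasureCode(data_fragments=4, parity_fragments=failures_required)


if __name__ == "__main__":
    for fraction in (0.9, 0.1):
        enc = choose_encoding(fraction, failures_required=2)
        print(fraction, enc, enc.storage_overhead())
```

With two tolerated failures, the replicated object occupies 3 times its size while the (4, 2) erasure-coded object occupies 1.5 times its size, which is the space saving the abstract refers to. A self-managing system would re-run such a decision as workload statistics accumulate and migrate the object when the answer changes.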

Similar resources

Towards a QoS-aware Virtualised Storage System

Every organisation depends critically on reliable high-performance storage. Driven by the high costs of maintaining and managing multiple local storage systems, there is a trend towards virtualised multi-tier storage infrastructures. The main limitation of such centralised solutions is their inability to guarantee application-level Quality of Service (QoS) without extensive and ongoing human in...

A Non-MDS Erasure Code Scheme for Storage Applications

This paper investigates the use of redundancy and self repairing against node failures in distributed storage systems using a novel non-MDS erasure code. In replication method, access to one replication node is adequate to reconstruct a lost node, while in MDS erasure coded systems which are optimal in terms of redundancy-reliability tradeoff, a single node failure is repaired after recovering the ...

Automated Storage Management with QoS Guarantee in Large-scale Virtualized Storage Systems

Storage virtualization in modern storage systems allows variability in the number of “physical” disks supporting a single “virtual” disk. In practice, IO workloads vary with time. Presuming the evolution of reasonable predictive models with the power of accurately predicting IO workload, it may be argued that it is straightforward to compute the number of disks needed at any time to satisfy QoS...

Proactive and Adaptive Data Migration in Hierarchical Storage Systems using Reinforcement Learning Agent

With data generation rates growing exponentially, businesses are having a difficult time maintaining data center infrastructure. Hierarchical storage systems have evolved as a better alternative for managing data, as frequently accessed data is placed on higher tiers and the least frequently accessed data on lower tiers. But the data arrangement is not always static. Data Migration is an operat...

To move or not to move: Cost optimization in a dual cloud-based storage architecture

IT enterprises have recently witnessed a dramatic increase in data volume and are faced with the challenges of storing and retrieving their data. Because cloud infrastructures offer storage and network resources in several geographically dispersed data centers (DCs), data can be stored and shared in a scalable and highly available manner with little or no capital investment. Due to divers...

Publication date: 2007